OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Machine-printed and hand-written text lines identification

Internal identifier: 001A32 (Main/Exploration); previous: 001A31; next: 001A33

Authors: U. Pal [India]; Bidyut Baran Chaudhuri [India]

Source: Pattern Recognition Letters [ISSN 0167-8655], vol. 22, no. 3–4, pp. 431–441

RBID : ISTEX:49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9

Abstract

There are many types of documents where machine-printed and hand-written texts intermixedly appear. Since the optical character recognition (OCR) methodologies for machine-printed and hand-written texts are different, to achieve optimal performance it is necessary to separate these two types of texts before feeding them to their respective OCR systems. In this paper, we present a machine-printed and hand-written text classification scheme for Bangla and Devnagari, the two most popular Indian scripts. The scheme is based on the structural and statistical features of the machine-printed and hand-written text lines. The classification scheme has an accuracy of 98.6%.

Url: https://api.istex.fr/document/49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9/fulltext/pdf
DOI: 10.1016/S0167-8655(00)00126-4


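Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B.T. Road, Calcutta 700 035, India (both authors)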



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Machine-printed and hand-written text lines identification</title>
<author>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation>
<country>Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0167-8655(00)00126-4</idno>
<idno type="url">https://api.istex.fr/document/49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000601</idno>
<idno type="wicri:Area/Istex/Curation">000593</idno>
<idno type="wicri:Area/Istex/Checkpoint">001086</idno>
<idno type="wicri:doubleKey">0167-8655:2001:Pal U:machine:printed:and</idno>
<idno type="wicri:Area/Main/Merge">001B25</idno>
<idno type="wicri:Area/Main/Curation">001A32</idno>
<idno type="wicri:Area/Main/Exploration">001A32</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Machine-printed and hand-written text lines identification</title>
<author>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B.T. Road, Calcutta 700 035</wicri:regionArea>
<wicri:noRegion>Calcutta 700 035</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Inde</country>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B.T. Road, Calcutta 700 035</wicri:regionArea>
<wicri:noRegion>Calcutta 700 035</wicri:noRegion>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Pattern Recognition Letters</title>
<title level="j" type="abbrev">PATREC</title>
<idno type="ISSN">0167-8655</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">22</biblScope>
<biblScope unit="issue">3–4</biblScope>
<biblScope unit="page" from="431">431</biblScope>
<biblScope unit="page" to="441">441</biblScope>
</imprint>
<idno type="ISSN">0167-8655</idno>
</series>
<idno type="istex">49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9</idno>
<idno type="DOI">10.1016/S0167-8655(00)00126-4</idno>
<idno type="PII">S0167-8655(00)00126-4</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0167-8655</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">There are many types of documents where machine-printed and hand-written texts intermixedly appear. Since the optical character recognition (OCR) methodologies for machine-printed and hand-written texts are different, to achieve optimal performance it is necessary to separate these two types of texts before feeding them to their respective OCR systems. In this paper, we present a machine-printed and hand-written text classification scheme for Bangla and Devnagari, the two most popular Indian scripts. The scheme is based on the structural and statistical features of the machine-printed and hand-written text lines. The classification scheme has an accuracy of 98.6%.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Inde</li>
</country>
<region>
<li>Bengale-Occidental</li>
</region>
<settlement>
<li>Calcutta</li>
</settlement>
<orgName>
<li>Institut indien de statistiques</li>
</orgName>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
</noRegion>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
</country>
</tree>
</affiliations>
</record>
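
As a rough illustration of how the record above could be read programmatically, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is not part of Dilib or Wicri. The record uses the `wicri:` prefix without declaring it, so a namespace-aware parser rejects it as written; the sketch binds the prefix to a placeholder URI (`urn:example:wicri`, an assumption, not an official namespace) before parsing. The `RECORD` string is an abridged copy of the record, and the helper names `parse_record` and `summarize` are hypothetical. The DOI link is built with the standard `https://doi.org/` resolver prefix.

```python
import xml.etree.ElementTree as ET

# Placeholder URI (assumption): the record does not declare the wicri prefix.
WICRI_NS = "urn:example:wicri"

# Abridged copy of the record shown above, kept short for illustration.
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Machine-printed and hand-written text lines identification</title>
<author>
<name sortKey="Pal, U" first="U." last="Pal">U. Pal</name>
</author>
<author>
<name sortKey="Chaudhuri, B B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9</idno>
<idno type="doi">10.1016/S0167-8655(00)00126-4</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(text):
    """Bind the undeclared wicri: prefix on the root element, then parse.

    Assumes the root start tag is exactly "<record>", as in the page above.
    """
    bound = text.replace("<record>", '<record xmlns:wicri="%s">' % WICRI_NS, 1)
    return ET.fromstring(bound)


def summarize(root):
    """Collect title, author names and DOI from the title and publication statements."""
    stmt = root.find(".//titleStmt")
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "doi": doi,
        "doi_url": "https://doi.org/" + doi,
    }


if __name__ == "__main__":
    print(summarize(parse_record(RECORD)))
```

Binding the prefix on the root keeps the record's element and attribute names untouched. Stripping the `wicri:` prefixes would also make the text parseable, but it would rename elements such as `wicri:regionArea`.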

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A32 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001A32 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9
   |texte=   Machine-printed and hand-written text lines identification
}}
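
For anyone generating such links for many records, the template call above could be produced by a small helper. The following Python sketch is only an illustration: the function name `explor_lien` and its argument names are hypothetical, and the field names and column alignment are copied from the template as shown on this page rather than taken from any Wicri documentation.

```python
def explor_lien(wiki, area, flux, etape, type_, cle, texte):
    """Format an "Explor lien" template call laid out like the one on this page."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "|name=" to 10 characters so the values line up in one column.
        lines.append("   " + ("|%s=" % name).ljust(10) + value)
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(explor_lien(
        wiki="Ticri/CIDE",
        area="OcrV1",
        flux="Main",
        etape="Exploration",
        type_="RBID",
        cle="ISTEX:49C2E2049B28B4ABE1CDE28ABF0CA9DF3074F0E9",
        texte="Machine-printed and hand-written text lines identification",
    ))
```

The parameter names `étape` and `clé` are left in French because they are the template's own field names.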

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024